Related resources
Clustering scRNA-Seq Data using TF-IDF
In this abstract, we propose several computational approaches for clustering scRNA-Seq data based on the Term Frequency Inverse Document Frequency (TF-IDF) transformation that has been successfully used in the field of text analysis. Empirical evaluation on simulated cell mixtures with different levels of complexity suggests that the TF-IDF methods consistently outperform existing scRNA-Seq clu...
Computational approaches for interpreting scRNA‐seq data
The recent developments in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation of vast amounts of transcriptomic data at cellular resolution. With these advances come new modes of data analysis, building on high-dimensional data mining techniques. Here, we consider biological questions for which scRNA-seq data is used, both at a cell and gene level, and...
dropClust: efficient clustering of ultra-large scRNA-seq data
Droplet based single cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale for such high dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique to develop a de novo clustering algorithm for large-scale single cell data. On a numb...
Zero-Inflated Exponential Family Embeddings
Word embeddings are a widely-used tool to analyze language, and exponential family embeddings (Rudolph et al., 2016) generalize the technique to other types of data. One challenge to fitting embedding methods is sparse data, such as a document/term matrix that contains many zeros. To address this issue, practitioners typically downweight or subsample the zeros, thus focusing learning on the non...
Adjusting for covariates in zero-inflated gamma and zero-inflated log-normal models for semicontinuous data
Semicontinuous data consist of a combination of a point-mass at zero and a positive skewed distribution. This type of non-negative data distribution is found in data from many fields, but presents unique challenges for analysis. Specifically, these data cannot be analyzed using positive distributions, but distributions that are unbounded are also likely a poor fit. Two-part models incorporate b...
Journal
Journal title: Nature Biotechnology
Year: 2020
ISSN: 1087-0156,1546-1696
DOI: 10.1038/s41587-019-0379-5